69 research outputs found
Contextual Parameter Generation for Universal Neural Machine Translation
We propose a simple modification to existing neural machine translation (NMT)
models that enables using a single universal model to translate between
multiple languages while allowing for language specific parameterization, and
that can also be used for domain adaptation. Our approach requires no changes
to the model architecture of a standard NMT system, but instead introduces a
new component, the contextual parameter generator (CPG), that generates the
parameters of the system (e.g., weights in a neural network). This parameter
generator accepts source and target language embeddings as input, and generates
the parameters for the encoder and the decoder, respectively. The rest of the
model remains unchanged and is shared across all languages. We show how this
simple modification enables the system to use monolingual data for training and
also perform zero-shot translation. We further show it is able to surpass
state-of-the-art performance for both the IWSLT-15 and IWSLT-17 datasets and
that the learned language embeddings are able to uncover interesting
relationships between languages.
Comment: Published in the proceedings of Empirical Methods in Natural Language Processing (EMNLP), 201
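The core idea above can be sketched in a few lines: a single shared generator maps a small per-language embedding to the parameters of a network layer, so each new language adds only an embedding vector rather than a full model copy. This is a minimal numpy sketch, not the paper's implementation; the dimensions and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

D_LANG = 4           # language-embedding size (assumed for illustration)
D_IN, D_OUT = 8, 8   # shape of one generated weight matrix

# The generator itself is shared across all languages: a single linear map
# from a language embedding to the flattened parameters of (here) one layer.
W_gen = rng.normal(scale=0.1, size=(D_IN * D_OUT, D_LANG))

def generate_params(lang_embedding):
    """Generate one layer's weights conditioned on a language embedding."""
    return (W_gen @ lang_embedding).reshape(D_IN, D_OUT)

# Each language contributes only a small embedding vector.
lang_en = rng.normal(size=D_LANG)
lang_de = rng.normal(size=D_LANG)

W_encoder_en = generate_params(lang_en)  # encoder weights for "English"
W_encoder_de = generate_params(lang_de)  # encoder weights for "German"
```

Because the generator is conditioned on the language embedding, two languages yield different layer weights while sharing the generator itself, which is what makes zero-shot combinations of source and target embeddings possible in principle.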
Enhancing Textbooks with Visuals from the Web for Improved Learning
Textbooks are the primary vehicle for delivering quality education to
students. It has been shown that explanatory or illustrative visuals play a key
role in retention, comprehension, and the general transfer of knowledge.
However, many textbooks, especially in the developing world, are of low quality
and lack engaging visuals to support student learning. In this paper, we
investigate the effectiveness of vision-language models to automatically
enhance textbooks with images from the web. Specifically, we collect a dataset
of e-textbooks from one of the largest free online publishers in the world. We
rigorously analyse the dataset, and use the resulting analysis to motivate a
task that involves retrieving and appropriately assigning web images to
textbooks, which we frame as a novel optimization problem. Through a
crowd-sourced evaluation, we verify that (1) while the original textbook images
are rated higher, automatically assigned ones are not far behind, and (2) the
choice of the optimization problem matters. We release the dataset of textbooks
with an associated image bank to spur further research in this area.
Comment: 17 pages, 27 figures
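The retrieval-and-assignment task described above can be illustrated as a matching problem: score every (section, image) pair in a shared vision-language embedding space, then assign images to sections one-to-one. The sketch below uses random vectors as stand-ins for real embeddings and a simple greedy matcher; the paper frames this as a richer optimization problem, so everything here is an assumed simplification.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical embeddings: 3 textbook sections and 5 candidate web images,
# both mapped into a shared vision-language space (e.g. a CLIP-style model).
sections = rng.normal(size=(3, 16))
images = rng.normal(size=(5, 16))

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Score matrix: how well each image fits each section.
scores = np.array([[cosine(s, m) for m in images] for s in sections])

# Greedy one-to-one assignment: repeatedly take the best remaining pair.
assignment = {}          # section index -> image index
used = set()
for _ in range(len(sections)):
    best = max(
        ((i, j) for i in range(len(sections)) if i not in assignment
                for j in range(len(images)) if j not in used),
        key=lambda ij: scores[ij],
    )
    assignment[best[0]] = best[1]
    used.add(best[1])
```

A globally optimal variant would solve the same score matrix with the Hungarian algorithm instead of greedy selection; the data structures stay identical.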
Understanding Arithmetic Reasoning in Language Models using Causal Mediation Analysis
Mathematical reasoning in large language models (LLMs) has garnered attention
in recent research, but there is limited understanding of how these models
process and store information related to arithmetic tasks. In this paper, we
present a mechanistic interpretation of LLMs for arithmetic-based questions
using a causal mediation analysis framework. By intervening on the activations
of specific model components and measuring the resulting changes in predicted
probabilities, we identify the subset of parameters responsible for specific
predictions. We analyze two pre-trained language models with different sizes
(2.8B and 6B parameters). Experimental results reveal that a small set of
mid-late layers significantly affect predictions for arithmetic-based
questions, with distinct activation patterns for correct and wrong predictions.
We also investigate the role of the attention mechanism and compare the model's
activation patterns for arithmetic queries with the prediction of factual
knowledge. Our findings provide insights into the mechanistic interpretation of
LLMs for arithmetic tasks and highlight the specific components involved in
arithmetic reasoning.
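The intervention described above, restoring a component's activation from one run inside another run and measuring the change in the output, can be shown on a toy network. This is a deliberately tiny numpy stand-in for an LLM, not the paper's setup; the "indirect effect" here is just the absolute output change per hidden unit.

```python
import numpy as np

rng = np.random.default_rng(2)

# A toy 2-layer network standing in for a language model.
W1 = rng.normal(size=(4, 6))
W2 = rng.normal(size=(6, 1))

def forward(x, patch=None):
    h = np.tanh(x @ W1)
    if patch is not None:        # intervene on a single hidden unit
        idx, value = patch
        h = h.copy()
        h[idx] = value
    return float(h @ W2)

x_clean = rng.normal(size=4)     # e.g. the original arithmetic prompt
x_corrupt = rng.normal(size=4)   # e.g. a corrupted prompt

h_clean = np.tanh(x_clean @ W1)  # cache clean activations
base = forward(x_corrupt)

# Indirect effect of unit i: restore its clean activation in the corrupted
# run and see how much the output moves back.
effects = [abs(forward(x_corrupt, patch=(i, h_clean[i])) - base)
           for i in range(6)]
top_unit = int(np.argmax(effects))
```

Ranking components by this effect is what lets the analysis localize predictions to a small set of mid-late layers, as the abstract reports.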
Learning the String Partial Order
We show that most structured prediction problems can be solved in linear time
and space by considering them as partial orderings of the tokens in the input
string. Our method computes real numbers for each token in an input string and
sorts the tokens accordingly, resulting in as few as 2 total orders of the
tokens in the string. Each total order possesses a set of edges oriented from
smaller to greater tokens. The intersection of total orders results in a
partial order over the set of input tokens, which is then decoded into a
directed graph representing the desired structure. Experiments show that our
method achieves 95.4 LAS and 96.9 UAS using an intersection of 2 total
orders, and 95.7 LAS and 97.1 UAS with 4, on the English Penn Treebank
dependency parsing benchmark. Our method is also the first linear-complexity coreference
resolution model and achieves 79.2 F1 on the English OntoNotes benchmark, which
is comparable with the state of the art.
Comment: 12 pages
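The intersection-of-total-orders construction is simple enough to show directly: each score vector over the tokens induces a total order (an edge from every lower-scored token to every higher-scored one), and intersecting two such edge sets yields a partial order in which tokens the orders disagree on become incomparable. The scores below are made up for illustration.

```python
# Two "total orders" over tokens, each induced by a per-token real score.
tokens = ["The", "dog", "barked", "loudly"]
scores = [
    [0.1, 0.9, 0.5, 0.7],   # hypothetical scores from order 1
    [0.2, 0.8, 0.3, 0.9],   # hypothetical scores from order 2
]

def total_order(s):
    # Edge (i, j) iff token i's score is smaller than token j's.
    return {(i, j)
            for i in range(len(s))
            for j in range(len(s))
            if s[i] < s[j]}

# The partial order is the intersection of the total orders; its edges can
# then be decoded into the directed graph of the target structure.
partial = total_order(scores[0]) & total_order(scores[1])
```

Here the two score vectors disagree on the relative order of "dog" and "loudly" (indices 1 and 3), so neither (1, 3) nor (3, 1) survives the intersection: those tokens are incomparable in the resulting partial order.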
Deep Clustering of Text Representations for Supervision-free Probing of Syntax
We explore deep clustering of text representations for unsupervised model
interpretation and induction of syntax. As these representations are
high-dimensional, out-of-the-box methods like KMeans do not work well. Thus,
our approach jointly transforms the representations into a lower-dimensional
cluster-friendly space and clusters them. We consider two notions of syntax:
part-of-speech induction (POSI) and constituency labelling (CoLab).
Interestingly, we find that Multilingual BERT (mBERT) contains a surprising
amount of syntactic knowledge of English, possibly even as much as English BERT
(EBERT). Our model can be used as a supervision-free probe, which is arguably a
less-biased way of probing. We find that unsupervised probes show benefits from
higher layers as compared to supervised probes. We further note that our
unsupervised probe utilizes EBERT and mBERT representations differently,
especially for POSI. We validate the efficacy of our probe by demonstrating its
capabilities as an unsupervised syntax induction technique. Our probe works
well for both syntactic formalisms by simply adapting the input
representations. We report competitive performance of our probe on 45-tag
English POSI, state-of-the-art performance on 12-tag POSI across 10 languages,
and competitive results on CoLab. We also perform zero-shot syntax induction on
resource-impoverished languages and report strong results.
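The two-step idea above, mapping high-dimensional representations into a lower-dimensional cluster-friendly space and then clustering there, can be sketched with plain PCA followed by k-means. The paper learns the transform jointly with the clustering objective, so this decoupled numpy version is only an assumed approximation, with synthetic blobs standing in for contextual representations.

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in "contextual representations": two well-separated blobs in 50-D.
X = np.vstack([
    rng.normal(loc=0.0, size=(20, 50)),
    rng.normal(loc=3.0, size=(20, 50)),
])

# Step 1: project into a low-dimensional space (plain PCA here; the paper
# instead learns this transform jointly with the clustering loss).
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:2].T            # 2-D "cluster-friendly" space

# Step 2: k-means in the reduced space (k = 2 syntactic categories).
centers = Z[[0, -1]].copy()  # one seed from each blob
for _ in range(10):
    labels = np.argmin(((Z[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    centers = np.array([Z[labels == k].mean(axis=0) for k in range(2)])
```

Running KMeans directly on the raw 50-D points would also work on this easy example; the reduction step matters precisely when, as the abstract notes, out-of-the-box KMeans fails in the original high-dimensional space.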
How Good Is NLP? A Sober Look at NLP Tasks through the Lens of Social Impact
Recent years have seen many breakthroughs in natural language processing
(NLP), transitioning it from a mostly theoretical field to one with many
real-world applications. Noting the rising number of applications of other
machine learning and AI techniques with pervasive societal impact, we
anticipate the rising importance of developing NLP technologies for social
good. Inspired by theories in moral philosophy and global priorities research,
we aim to promote a guideline for social good in the context of NLP. We lay the
foundations via the moral philosophy definition of social good, propose a
framework to evaluate the direct and indirect real-world impact of NLP tasks,
and adopt the methodology of global priorities research to identify priority
causes for NLP research. Finally, we use our theoretical framework to provide
some practical guidelines for future NLP research for social good. Our data and
code are available at http://github.com/zhijing-jin/nlp4sg_acl2021. In
addition, we curate a list of papers and resources on NLP for social good at
https://github.com/zhijing-jin/NLP4SocialGood_Papers.
Comment: Findings of ACL 2021; also accepted at the NLP for Positive Impact workshop@ACL 202
When does aggregating multiple skills with multi-task learning work? A case study in financial NLP
Multi-task learning (MTL) aims at achieving a better model by leveraging data and knowledge from multiple tasks. However, MTL does not always work: negative transfer can occur between tasks, especially when loosely related skills are aggregated, and it remains an open question when MTL works. Previous studies show that MTL performance can be improved by algorithmic tricks; however, which tasks and skills should be included is less well explored. In this work, we conduct a case study in Financial NLP, where multiple datasets exist for skills relevant to the domain, such as numeric reasoning and sentiment analysis. Given the task difficulty and data scarcity in the Financial NLP domain, we explore when aggregating such diverse skills from multiple datasets with MTL can work. Our findings suggest that the key to MTL success lies in skill diversity, relatedness between tasks, and the choice of aggregation size and shared capacity. Specifically, MTL works well when tasks are diverse but related, and when the size of the task aggregation and the shared capacity of the model are balanced to avoid overwhelming certain tasks.
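The "shared capacity" knob discussed above refers to the part of the model used by every task, as opposed to the task-specific heads. A minimal hard-parameter-sharing sketch makes the split concrete; the task names and dimensions here are hypothetical, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(4)

D_IN, D_SHARED = 10, 6   # D_SHARED is the shared capacity being tuned

# Shared encoder: one matrix used by every task (the shared capacity).
W_shared = rng.normal(scale=0.3, size=(D_IN, D_SHARED))

# Task-specific heads, one per aggregated skill (names hypothetical).
heads = {
    "sentiment":         rng.normal(scale=0.3, size=(D_SHARED, 2)),
    "numeric_reasoning": rng.normal(scale=0.3, size=(D_SHARED, 1)),
}

def predict(task, x):
    h = np.tanh(x @ W_shared)   # representation shared across tasks
    return h @ heads[task]      # task-specific output

x = rng.normal(size=D_IN)
sentiment_logits = predict("sentiment", x)
numeric_score = predict("numeric_reasoning", x)
```

Negative transfer shows up in this architecture when gradients from one task's head push `W_shared` in a direction that hurts another task, which is why the balance between aggregation size and `D_SHARED` matters.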
Navigating the Ocean of Biases: Political Bias Attribution in Language Models via Causal Structures
The rapid advancement of Large Language Models (LLMs) has sparked intense
debate regarding their ability to perceive and interpret complex
socio-political landscapes. In this study, we undertake an exploration of
decision-making processes and inherent biases within LLMs, exemplified by
ChatGPT, specifically contextualizing our analysis within political debates. We
aim not to critique or validate LLMs' values, but rather to discern how they
interpret and adjudicate "good arguments." By applying Activity Dependency
Networks (ADNs), we extract the LLMs' implicit criteria for such assessments
and illustrate how normative values influence these perceptions. We discuss the
consequences of our findings for human-AI alignment and bias mitigation. Our
code and data are available at https://github.com/david-jenny/LLM-Political-Study.